<!doctype html><html lang=zh-cn><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><title>Rust: A unique perspective</title><link rel=stylesheet href=https://note-2019-images.oss-cn-hangzhou.aliyuncs.com/notes.css media=all><script src=/static/main.js></script><body data-category=default data-clipid=1568279482><div class="mx-wc-main yue"><div><div><p><a href=https://www.rust-lang.org/>The Rust programming language</a> is designed to ensure memory safety,
using a mix of compile-time and run-time checks to stop programs from
accessing invalid pointers or sharing memory across threads without proper
synchronization.<p>The way Rust does this is usually introduced in terms of <strong>mutable</strong> and
<strong>immutable</strong> borrowing and lifetimes. This makes sense, because these are
mechanisms that Rust programmers must use directly. They describe <em>what</em> the
Rust compiler checks when it compiles a program.<p>However, there is another way to explain Rust. This alternate story focuses
on <strong>unique</strong> versus <strong>shared</strong> access to memory. I believe this
version is useful for understanding <em>why</em> various checks exist and <em>how</em> they
provide memory safety.<p>Most experienced Rust programmers are already familiar with this concept.
Five years ago, Niko Matsakis even proposed <a href=http://smallcultfollowing.com/babysteps/blog/2014/05/13/focusing-on-ownership/>changing the <code>mut</code> keyword to
<code>uniq</code></a> to emphasize it. My goal is to make these important
ideas more accessible to beginning and intermediate Rust programmers.<p>This is a very quick introduction that skips over many details to focus on
high-level concepts. It should complement the official Rust documentation, not
supplant it.<h2>Unique access</h2><p>The first key observation is: <strong>If a variable has unique access to a value,
then it is safe to mutate it.</strong><p>By <em>safe</em>, I mean <em><span class="c1 ann">memory-safe</span></em><span class="c1 ann">: free from invalid pointer accesses, data races,
or other causes of </span><a href=https://doc.rust-lang.org/nomicon/what-unsafe-does.html><span class="c1 ann">undefined behavior</span></a><span class="c1 ann">. </span>And by <em><span class="c1 ann">unique access</span></em><span class="c1 ann">, I mean that
while this variable is alive, there are no other variables that can be used to
read or write any part of the same value.</span><p>Unique access makes memory safety very simple: If there are no other
pointers to the value, then you don’t need to worry about invalidating them.
Similarly, if variables on other threads can&#39;t access the value, you needn’t
worry about synchronization.<h3>Unique ownership</h3><p>One form of unique access is <strong>ownership</strong>. <span class="c1 ann">When you initialize a variable with
a value, it becomes the sole </span><em><span class="c1 ann">owner</span></em><span class="c1 ann"> of that value.</span> Because the value has
just one owner, the owner can safely mutate the value, destroy it, or
transfer it to a new owner.<p>Depending on the type of the value, assigning a value to a new variable
will either <strong>move</strong> it or <strong>copy</strong> it. Either way, unique ownership is
preserved. For a <em>move</em> type, the old owner becomes inaccessible after the
move, so we still have one value owned by one variable:<figure><pre><code class="language-rust hljs">let x = vec![1, 2, 3];
let y = x;             // move ownership from x to y
// can’t access x after moving its value to y</code></pre></figure><p>For a <em>copy</em> type, the value is duplicated, so we end up with two values owned
by two variables:<figure><pre><code class="language-rust hljs">let x = 1;
let y = x; // copy the value of x into y</code></pre></figure><p>In this case, each variable ends up with a separate, independent value.
Mutating one will not affect the other.<p><span class="c1 ann">One value might be owned by another value, rather than directly by a variable.</span>
For example, a struct owns its fields, a <code>Vec&lt;T&gt;</code> owns the <code>T</code> items inside
it, and a <code>Box&lt;T&gt;</code> owns the <code>T</code> that it points to.<h3>Unique borrowing</h3><p><span class="c1 ann">If you have unique access to a value of type </span><code><span class="c1 ann">T</span></code><span class="c1 ann">, you can borrow a </span><strong><span class="c1 ann">unique
reference</span></strong><span class="c1 ann"> to that value. </span>A unique reference to a <code>T</code> has type <code>&amp;mut T</code>.<p>Because it’s safe to mutate when you have a unique reference, unique
references are also called “mutable references.”<p>The Rust compiler enforces this uniqueness at compile time. In any region of
code where the unique reference may be used, no other reference to any part of
the same value may exist, and even the owner of that value may not move or
destroy it. Violating this rule triggers a compiler error.<p><span class="c1 ann">A reference only </span><strong><span class="c1 ann">borrows</span></strong><span class="c1 ann"> the value, and must return it to its owner.
</span>This means that the reference can be used to mutate the value, but not to move
or destroy it (unless it overwrites it with a new value, for example using
<a href=https://doc.rust-lang.org/std/mem/fn.replace.html><code>replace</code></a>). Just like in real life, you need to give back what you’ve
borrowed.<p>Borrowing a value is like locking it. Just like a mutex lock in a
multi-threaded program, it’s usually best to hold a borrowed reference for as
little time as possible. Storing a unique reference in a long-lived data
structure will prevent any other use of the value for as long as that
structure exists.<h3>Unique references can&#39;t be copied</h3><p>An <code>&amp;mut T</code> cannot be copied or cloned, because this would result in
two “unique” references to the same value. It can only be moved:<figure><pre><code class="language-rust hljs">let mut a = 1;
let x = &amp;mut a;
let y = x; // move the reference from x into y
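// Sketch: after the move, y is the only unique reference, so it may mutate `a`
*y += 1;
assert_eq!(*y, 2); // reading through y sees the updated value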
// x is no longer accessible here</code></pre></figure><p>However, <span class="c6 ann">you can temporarily “<span class=ann>re-borrow</span>” from a unique reference</span>. This gives
a new unique reference to the same value, but the original reference can no
longer be accessed until the new one goes out of scope (under the older, lexical
borrow checker) or is last used (under non-lexical lifetimes, which newer compilers use):<figure><pre><code class="language-rust hljs">let mut a = 1;
let x = &amp;mut a;
{
    let y = &amp;mut *x;
    // x is &#34;re-borrowed&#34; and cannot be used while y is alive
    *y = 4; // y has unique access and can mutate `a`
}
// x becomes accessible again after y is dead
*x += 1; // now x has unique access again and can mutate the value
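// Sketch: function calls re-borrow implicitly. `bump` is a hypothetical helper
// taking a unique reference; `bump(r)` acts like `bump(&amp;mut *r)`.
fn bump(n: &amp;mut i32) { *n += 1; }
let mut b = 0;
let r = &amp;mut b;
bump(r);
bump(r); // r was only re-borrowed by the first call, not moved, so it is still usable
assert_eq!(*r, 2);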
assert_eq!(*x, 5);</code></pre></figure><p>Re-borrowing happens implicitly when you call a function that takes a unique
reference. This greatly simplifies code that passes unique references around,
but can confuse programmers who are just learning about these restrictions.<h2>Shared access</h2><p>A value is <strong>shared</strong> if there are multiple variables that are alive at the
same time that can be used to access it.<p>While a value is shared, we have to be a lot more careful about mutating it.
Writing to the value through one variable could invalidate pointers held by
other variables, or cause a data race with readers or writers on other
threads.<p>Rust ensures that <strong>you can read from a value only while no variables can
write to it</strong>, and <strong>you can write to a value only while no other variables
can read or write to it.</strong> In other words, you can have a unique writer, <em>or</em>
multiple readers, but not both at once. Some Rust types enforce this at
compile time and others at run time, but the principle is always the same.<h3>Shared ownership</h3><p>One way to share a value of type <code>T</code> is to create an <code>Rc&lt;T&gt;</code>, or
“reference-counted pointer to T”. This allocates space on the heap for a <code>T</code>,
plus some extra space for reference counting (tracking the number of pointers
to the value). Then you can call <code>Rc::clone</code> to increment the reference count
and receive another <code>Rc&lt;T&gt;</code> that points to the same value:<figure><pre><code class="language-rust hljs">use std::rc::Rc;

let x = Rc::new(1);
let y = x.clone();
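// Sketch using the Rc::ptr_eq and Rc::strong_count associated functions:
assert!(Rc::ptr_eq(&amp;x, &amp;y));         // x and y point to the same heap value
assert_eq!(Rc::strong_count(&amp;x), 2); // and the reference count is now 2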
// x and y hold two different Rc that point to the same memory</code></pre></figure><p>Because the <code>T</code> lives on the heap and <code>x</code> and <code>y</code> just hold pointers to it, it
can outlive any particular pointer. It will be destroyed only when the last
of the pointers is dropped. This is called <strong>shared ownership</strong>.<h3>Shared borrowing</h3><p>Since <code>Rc&lt;T&gt;</code> doesn&#39;t have unique access to its <code>T</code>, it can’t give out a
unique <code>&amp;mut T</code> reference (unless it checks at run time that the reference
count is equal to 1, so it is not actually shared). But it <em>can</em> give out a
<strong>shared reference to T</strong>, whose type is written <code>&amp;T</code>. (This is also called
an “immutable reference.”)<p>A shared reference is another “borrowed” type which can’t outlive its
referent. The compiler ensures that a shared reference can’t be created while
a unique reference exists to any part of the same value, and vice-versa. And
(just like unique references) the owner isn’t allowed to drop/move/mutate the
value while any shared references are alive.<p>If you have unique access to a value, you can produce many shared references
or one unique reference to it. However, if you only have shared access to a
value, you can’t produce a unique reference (at least, not without some
additional checks, which I’ll discuss soon). One consequence of this is that
you can convert an <code>&amp;mut T</code> to an <code>&amp;T</code>, but not vice-versa.<p>Because multiple shared references are allowed, an <code>&amp;T</code> can be copied/cloned
(unlike <code>&amp;mut T</code>).<h2>Thread safety</h2><p>Astute readers might notice that merely cloning an <code>Rc&lt;T&gt;</code> mutates a value in
memory, since it modifies the reference count. This could cause a data race
if another clone of the <code>Rc</code> were accessed at the same time on a different
thread! The compiler solves this in typical Rust fashion: By refusing to
compile any program that passes an <code>Rc</code> to a different thread.<p>Rust has two built-in traits that it uses to mark types that can be accessed
safely by other threads:<ul><li><p><strong><code>T: Send</code></strong> means it&#39;s safe to access a <code>T</code> on a single other thread,
where one thread at a time has exclusive access. A value of this type
can be moved to another thread by unique ownership, or borrowed on another
thread by unique reference (<code>&amp;mut T</code>). A more descriptive name for this
trait might be <strong><code>UniqueThreadSafe</code></strong>.<li><p><strong><code>T: Sync</code></strong> means it’s safe for many threads to access a <code>T</code>
simultaneously, with each thread having shared access.
Values of such types can be accessed on other threads via shared ownership
or shared references (<code>&amp;T</code>). A more descriptive name would be
<strong><code>SharedThreadSafe</code></strong>.</ul><p><code>Rc&lt;T&gt;</code> implements neither of these traits, so an <code>Rc&lt;T&gt;</code> cannot be moved or
borrowed into a variable on a different thread. It is forever trapped on the
thread where it was born.<p>The standard library also offers an <code>Arc&lt;T&gt;</code> type, which is exactly like
<code>Rc&lt;T&gt;</code> except that it implements <code>Send</code> and <code>Sync</code> (provided <code>T</code> is itself <code>Send</code> and <code>Sync</code>), and uses atomic operations to
synchronize access to its reference counts. This can make <code>Arc&lt;T&gt;</code> a little
more expensive at run time, but it allows multiple threads to share a value
safely.<p>These traits are not mutually exclusive. Many types are both <code>Send</code> and
<code>Sync</code>, meaning that it’s safe to give unique access to one other thread (for
example, moving the value itself or sending an <code>&amp;mut T</code> reference) <em>or</em> shared
access to many threads (for example, sending multiple <code>Arc&lt;T&gt;</code> or <code>&amp;T</code>).<h2>Shared mutability</h2><p>So far, we’ve seen that sharing is safe when values are not mutated, and
mutation is safe when values are not shared. But what if we want to share
<em>and</em> mutate a value? The Rust standard library provides several different
mechanisms for <strong>shared mutability</strong>.<p>The official documentation also calls this “interior mutability” because it
lets you mutate a value that is “inside” of an immutable value. This
terminology can be confusing: What does it mean for the exterior to be
“immutable” if its interior is mutable? I prefer “shared mutability” which
puts the spotlight on a different question: How can you safely mutate a value
while it is shared?<h3>What could go wrong?</h3><p>What’s the big deal about shared mutation? Let’s start by listing some of the
ways it could go wrong:<p>First, mutating a value can cause <strong>pointer invalidation</strong>. For example,
pushing to a vector might cause it to reallocate its buffer. If there are
other variables that contained addresses of items in the buffer, they would
now point to deallocated memory. Or, mutating an enum might overwrite a
value of one type with a value of a different type. A pointer to the old
value will now be pointing at memory occupied by the wrong type. Either of
these cases would trigger undefined behavior.<p>Second, it could violate <strong>aliasing assumptions</strong>. For example, the optimizing
compiler assumes by default that the referent of an <code>&amp;T</code> reference will not
change while the reference exists. It might re-order code based on this
assumption, leading to undefined behavior when the assumption is violated.<p>Third, if one thread mutates a value at the same time that another thread is
accessing it, this causes a <strong>data race</strong> unless both threads use
<a href=https://doc.rust-lang.org/std/sync/>synchronization</a> primitives to prevent their operations from overlapping.
Data races can cause arbitrary undefined behavior (in part because data races
can also violate assumptions made by the optimizer during code generation).<h3>UnsafeCell</h3><p>To fix the problem of aliasing assumptions, we need <a href=https://doc.rust-lang.org/std/cell/struct.UnsafeCell.html><code>UnsafeCell&lt;T&gt;</code></a>. The
compiler knows about this type and treats it specially: It tells the optimizer
that the value inside an <code>UnsafeCell</code> is not subject to the usual restrictions
on aliasing.<p>Safe Rust code doesn’t use <code>UnsafeCell</code> directly. Instead, it’s used by
libraries (including the standard library) that provide APIs for <em>safe</em> shared
mutability. All of the shared mutable types discussed in the following
sections use <code>UnsafeCell</code> internally.<p><code>UnsafeCell</code> solves only one of the three problems listed above. Next, we&#39;ll
see some ways to solve the other two problems: pointer invalidation and data
races.<h3>Multi-threaded shared mutability</h3><p>Rust programs can safely mutate a value that’s shared across threads, as long
as the basic rules of unique and shared access are enforced: Only one thread
at a time may have unique access to a value, and only this thread can mutate
it. When no thread has unique access, then many threads may have shared
access, but the value can’t be mutated while they do.<p>Rust has two main types that allow thread-safe shared mutation:<ul><li><p><strong><code>Mutex&lt;T&gt;</code></strong> allows one thread at a time to “lock” a mutex and get unique
access to its contents. If a second thread tries to lock the mutex at the
same time, the second thread will block until the first thread unlocks it.
Since <code>Mutex</code> provides access to only one thread at a time, it can be used to
share any type that implements the <code>Send</code> (“unique thread-safe”) trait.<li><p><strong><code>RwLock&lt;T&gt;</code></strong> is similar but has two different types of lock: A “write”
lock that provides unique access, and a “read” lock that provides shared
access. It will allow many threads to hold read locks at the same time, but
only one thread can hold a write lock. If one thread tries to write while
other threads are reading (or vice-versa), it will block until the other
threads release their locks. Since <code>RwLock</code> provides both unique and shared
access, its contents must implement both <code>Send</code> (“unique thread-safe”) and
<code>Sync</code> (“shared thread-safe”).</ul><p>These types prevent pointer invalidation by using run-time checks to enforce
the rules of unique and shared borrowing. They prevent data races by using
synchronization primitives provided by the platform’s native threading system.<p>In addition, various <strong><a href=https://doc.rust-lang.org/std/sync/atomic/>atomic types</a></strong> allow safe shared mutation of
individual primitive values. These prevent data races by using compiler
intrinsics that provide synchronized operations, and they prevent pointer
invalidation by refusing to give out references to their contents; you can
only read from them or write to them by value.<p>All these types are only useful when shared by multiple threads, so they are
often used in combination with <code>Arc</code>. Because <code>Arc</code> lets multiple threads
share ownership of a value, it works with threads that might outlive the
function that spawns them (and therefore can’t borrow references from it).
However, <a href=https://docs.rs/crossbeam/0.3.2/crossbeam/struct.Scope.html#method.spawn>scoped threads</a> are guaranteed to terminate before their spawning
function, so they can capture shared references like <code>&amp;Mutex&lt;T&gt;</code> instead of
<code>Arc&lt;Mutex&lt;T&gt;&gt;</code>.<h3>Single-threaded shared mutability</h3><p>The standard library also has two types that allow safe shared mutation
within a single thread. These types don’t implement the <code>Sync</code> trait, so the
compiler won&#39;t let you share them across multiple threads. This neatly avoids
data races, and also means that these types don’t need atomic operations
(which are potentially expensive).<ul><li><p><strong><code>Cell&lt;T&gt;</code></strong> solves the problem of pointer invalidation by forbidding
pointers to its contents. Like the atomic types mentioned above, you can only
read from it or write to it by value. Changing the data “inside” of the
<code>Cell&lt;T&gt;</code> is okay, because there are no shared pointers to that data – only to
the <code>Cell&lt;T&gt;</code> itself, whose type and address do not change when you mutate its
interior. (Now we see why “interior mutability” is also a useful concept.)<li><p>Many Rust types are useless without references, so <code>Cell</code> is often too
restrictive. <strong><code>RefCell&lt;T&gt;</code></strong> allows you to borrow either unique or shared
references to its contents, but it keeps count of how many borrowers are alive
at a time. Like <code>RwLock</code>, it allows one unique reference or many shared
references, but not both at once. It enforces this rule using run-time
checks. (But since it’s used within a single thread, it can’t block the
thread while waiting for other borrowers to finish. Instead, it panics
if a program violates its borrowing rules.)</ul><p>These types are often used in combination with <code>Rc&lt;T&gt;</code>, so that a value shared
by multiple owners can still be mutated safely. They may also be used for
mutating values behind shared references. The <a href=https://doc.rust-lang.org/std/cell/><code>std::cell</code></a> docs have some
examples.<h2>Summary</h2><p>To summarize some key ideas:<ul><li>Rust has two types of references: unique and shared.<li>Unique mutable access is easy.<li>Shared immutable access is easy.<li>Shared mutable access is hard.<li>This is true for both single-threaded and multi-threaded programs.</ul><p>We also saw a couple of ways to classify Rust types. Here’s a table showing
some of the most common types according to this classification scheme:<table><tbody><tr><td><th>Unique<th>Shared<tr><th>Borrowed<td><code>&amp;mut T</code><td><code>&amp;T</code><tr><th>Owned<td><code>T</code>, <code>Box&lt;T&gt;</code><td><code>Rc&lt;T&gt;</code>, <code>Arc&lt;T&gt;</code></table><p>I hope that thinking of these types in terms of uniqueness and sharing will
help you understand how and why they work, as it helped me.<h2>Want to know more?</h2><p>As I said at the start, this is just a quick introduction and glosses over
many details. The exact rules about unique and shared access in Rust are
still being worked out. The <a href=https://doc.rust-lang.org/nomicon/aliasing.html>Aliasing</a> chapter of the Rustonomicon explains
more, and Ralf Jung’s <a href=https://www.ralfj.de/blog/2018/11/16/stacked-borrows-implementation.html>Stacked Borrows</a> model is the start of a more complete
and formal definition of the rules.<p>If you want to know more about how shared mutability can lead to
memory-unsafety, read <a href=https://manishearth.github.io/blog/2015/05/17/the-problem-with-shared-mutability/>The Problem With Single-threaded Shared Mutability</a> by
Manish Goregaokar.<p>The Swift language has an approach to memory safety that is similar in some
ways, though its exact mechanisms are different. You might be interested in
its recently-introduced <a href=https://swift.org/blog/swift-5-exclusivity/>Exclusivity Enforcement</a> feature, and the <a href=https://github.com/apple/swift/blob/fa952d398611e9a2b97531e2ac3efb6c36e9ba98/docs/OwnershipManifesto.md>Ownership
Manifesto</a> that originally described its design and rationale.</div></div><hr><div><label>Original URL: <a href=https://limpet.net/mbrubeck/2019/02/07/rust-a-unique-perspective.html>Visit</a></label><br><label>Created: 2019-09-12 17:11:22</label><br><label>Category: default</label><br><label>Tags: <code>rust</code></label></div></div><script src=https://note-2019-images.oss-cn-hangzhou.aliyuncs.com/highlight.pack.js></script><link rel=stylesheet href=https://note-2019-images.oss-cn-hangzhou.aliyuncs.com/highlight.vs.css><script>
        hljs.initHighlightingOnLoad();
        document.querySelectorAll('pre.hljs').forEach((block) => {
            hljs.highlightBlock(block);
        });
    </script>